Training of stream weights for the decoding of speech using parallel feature streams

Authors

  • Xiang Li
  • Richard M. Stern
Abstract

In speech recognition systems, information from multiple sources, such as different feature streams, can be combined in many different ways to yield better recognition accuracy. In general, information may be combined at the level of the incoming feature vectors, at the level of the decoding process, or after hypothesis generation. In this paper we focus on the specific case where parallel streams of features are used simultaneously during search to generate a hypothesis or a set of hypotheses. In this case, the contribution of each feature stream to the score associated with a frame of speech must be weighted appropriately during search. We present an offline, data-driven algorithm for determining the weight to be associated with each feature stream when combining acoustic likelihoods for each frame. Experimental results show that the word error rates (WERs) obtained using the proposed algorithm are lower than those obtained using conventional schemes for parallel feature combination.
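As an illustration of what weighting stream contributions during search can look like, the sketch below combines per-stream acoustic log-likelihoods for one frame as a weighted sum, which is a common log-linear formulation in multi-stream recognizers. The abstract does not state the exact combination rule or how the weights are trained, so the function name `combine_stream_scores`, the data layout, and all numbers here are assumptions made for illustration only. It is not the authors' algorithm.

```python
from typing import List, Sequence


def combine_stream_scores(
    stream_log_likelihoods: Sequence[Sequence[float]],
    stream_weights: Sequence[float],
) -> List[float]:
    """Return one combined acoustic score per state for a single frame.

    Assumed (hypothetical) layout: stream_log_likelihoods[s][j] is the
    log-likelihood of the current frame under state j according to
    feature stream s. The combined score for state j is the weighted
    sum over streams: sum_s weight[s] * log_likelihood[s][j].
    """
    if len(stream_log_likelihoods) != len(stream_weights):
        raise ValueError("need exactly one weight per stream")
    num_states = len(stream_log_likelihoods[0])
    if any(len(row) != num_states for row in stream_log_likelihoods):
        raise ValueError("every stream must score the same set of states")

    combined = [0.0] * num_states
    for weight, row in zip(stream_weights, stream_log_likelihoods):
        for j, log_lik in enumerate(row):
            combined[j] += weight * log_lik
    return combined


# Made-up log-likelihoods for one frame, three states, two streams.
frame_scores = [
    [-2.0, -4.0, -6.0],  # stream A (assumed values)
    [-5.0, -1.0, -3.0],  # stream B (assumed values)
]

equal = combine_stream_scores(frame_scores, [0.5, 0.5])
skewed = combine_stream_scores(frame_scores, [0.75, 0.25])

# Index of the best-scoring state under each weighting.
best_equal = max(range(len(equal)), key=equal.__getitem__)
best_skewed = max(range(len(skewed)), key=skewed.__getitem__)
```

With these made-up numbers, equal weights favor the second state while weights skewed toward stream A favor the first state. This shows why the choice of stream weights can change the hypothesis selected during search.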

Related resources

Dynamic stream weight estimation in coupled-HMM-based audio-visual speech recognition using multilayer perceptrons

Jointly using audio and video features can increase the robustness of automatic speech recognition systems in noisy environments. A systematic and reliable performance gain, however, is only achieved if the contributions of the audio and video stream to the decoding decision are dynamically optimized, for example via so-called stream weights. In this paper, we address the problem of dynamic str...

Visuo-phonetic decoding using multi-stream and context-dependent models for an ultrasound-based silent speech interface

Recent improvements are presented for phonetic decoding of continuous-speech from ultrasound and optical observations of the tongue and lips in a silent speech interface application. In a new approach to this critical step, the visual streams are modeled by context-dependent multi-stream Hidden Markov Models (CD-MSHMM). Results are compared to a baseline system using context-independent modelin...

Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR

Automatic speech recognition (ASR) enables very intuitive human-machine interaction. However, signal degradations due to reverberation or noise reduce the accuracy of audio-based recognition. The introduction of a second signal stream that is not affected by degradations in the audio domain (e.g., a video stream) increases the robustness of ASR against degradations in the original domain. Here,...

Combined discriminative training for multi-stream HMM-based audio-visual speech recognition

In this paper we investigate discriminative training of models and feature space for a multi-stream hidden Markov model (HMM) based audio-visual speech recognizer (AVSR). Since the two streams are used together in decoding, we propose to train the parameters of the two streams jointly. This is in contrast to prior work which has considered discriminative training of parameters in each stream in...

Discriminative speaker adaptation using articulatory features

This paper presents an automatic speech recognition system using acoustic models based on both sub-phonetic units and broad, phonological features such as Voiced and Round as output densities in a hidden Markov model framework. The aim of this work is to improve speech recognition performance particularly on conversational speech by using units other than phones as a basis for discrimination be...

Publication date: 2003